Conversation

ntjohnson1
Collaborator

@ntjohnson1 ntjohnson1 commented Jun 28, 2025

It would be nice to land #432 first since that resolves all of our doc warnings. I didn't fully merge it here so if that merges first I probably need to review this to make sure I didn't break any new stuff. Alternatively if we want to land this first I can revert the few unrelated commits (they made local iteration cleaner to avoid whatever mac precision thing I'm hitting).

This resolves #312. I added tests, but it's hard to say if things are totally correct. I never fully grokked the sparse generation example in the tutorial and didn't dig into it now, but the results all seem to look reasonable. There will probably be follow-on work after this to actually move usages over to this implementation, related to #363. This ended up being a larger task than expected, so I'm punting on that for now.

Style things:

  • Maybe we need a style doc
  • MATLAB has these big arbitrary dictionaries to capture outputs, plus liberal use of varargin. I haven't been super consistent on this, but dataclasses seem nice for some of it, and that is the direction I went here
    • The CPProblem vs ExistingCPSolution split is a little clunky, but pretty clear/explicit. If I added a solution field to CPProblem, we'd have a bit of a mess with default values and clarity around when the user wants to generate things or not. Open to opinions about the usability
  • If our inputs are clear and unambiguous, I don't think returning the input parameters in the output makes sense.
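To make the dataclass direction concrete, here is a hypothetical sketch of the pattern described above. The class names CPProblem and ExistingCPSolution and the noise/sparse_generation parameters come from this discussion; the other fields and defaults are illustrative, not pyttb's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the dataclass-based parameter pattern discussed above.
# Field names beyond the ones mentioned in this thread are illustrative.
@dataclass
class CPProblem:
    """Specify a CP problem whose solution should be generated from scratch."""
    shape: tuple        # tensor dimensions
    num_factors: int    # CP rank
    noise: float = 0.1  # relative noise added to the generated data

@dataclass
class ExistingCPSolution:
    """Generate data from a solution the caller already has."""
    solution: object    # an existing ktensor
    noise: float = 0.0
    sparse_generation: Optional[float] = None  # target nonzero fraction, if sparse

# Explicit construction makes the caller's intent unambiguous:
params = ExistingCPSolution(solution=None, noise=0.0, sparse_generation=0.5)
print(params.sparse_generation)
```

The upside of two separate dataclasses, as noted above, is that "generate a solution" and "use my solution" cannot be confused at the call site, at the cost of a slightly clunkier API surface.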

📚 Documentation preview 📚: https://pyttb--442.org.readthedocs.build/en/442/

@ntjohnson1 ntjohnson1 requested a review from dmdunla June 28, 2025 16:31
@dmdunla
Collaborator

dmdunla commented Aug 28, 2025

@ntjohnson1 Thanks, what a major undertaking!

This is a departure from the TTB for MATLAB implementation, in that 1) you need to choose a Problem class, add parameterization to that, and then instantiate a (data, solution) pair; and 2) missing-data parameterization is handled outside of the Problem class in the function signature, leading to two inputs as opposed to one in TTB (e.g., passing params again to make an identical copy).

Mapping to the TTB implementation, I now see the challenges, as there are so many options and combinations of options. So, I am OK with this for now. I will explicitly reach out to users to request feedback once this lands in its initial form.

@ntjohnson1
Collaborator Author

I will explicitly reach out to users to request feedback once this lands in its initial form.

Open to feedback or a design discussion if people have thoughts. I gave this some consideration when I put it together, but biased towards making expectations more explicit/clear, which might hurt usability a little. I'm sure there are at least some ergonomic improvements to be had.

@dmdunla
Collaborator

dmdunla commented Aug 29, 2025

@ntjohnson1 Everything seems to align with TTB for MATLAB, except the Sparse Problem creation. The number of nonzeros seems to be significantly lower in pyttb. When I ask for 50% sparsity, TTB for MATLAB returns num_nonzeros near 0.5 * prod(size(S)) [just a bit lower, but you can estimate this using sampling with replacement, and it is predicted to be lower than 50%]. However, in pyttb, the number is often off by a factor of ~3 (or a bit more):

sz = [20 15 10];
nf = 4;
A = cell(3,1);
for n = 1:length(sz)
    A{n} = rand(sz(n), nf);
    for r = 1:nf
        p = randperm(sz(n));
        idx = p(1:round(.2*sz(n)));
        A{n}(idx,r) = 10 * A{n}(idx,r);
    end
end
S = ktensor(A);
S = normalize(S,'sort',1);

info = create_problem('Soln', S, 'Sparse_Generation', 0.5);
num_nonzeros = nnz(info.Data)
total_insertions = sum(info.Data.vals)

num_nonzeros =

   718


total_insertions =

        1500
import numpy as np
import pyttb as ttb
from pyttb.create_problem import ExistingCPSolution, create_problem

shape = (20, 15, 10)
num_factors = 4
A = []
for n in range(len(shape)):
    A.append(np.random.rand(shape[n], num_factors))
    for r in range(num_factors):
        p = np.random.permutation(np.arange(shape[n]))
        idx = p[1 : round(0.2 * shape[n])]
        A[n][idx, r] *= 10
S = ttb.ktensor(A)
S.normalize(sort=True)

existing_params = ExistingCPSolution(S, noise=0.0, sparse_generation=0.5)
solution, data = create_problem(existing_params)
print(
    f"num_nonzeros: {data.nnz}\n"
    f"total_insertions: {np.sum(data.vals)}\n"
)

num_nonzeros: 258
total_insertions: 1500.0
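The "predicted to be lower than 50%" remark can be sanity-checked without any tensor code. This is a standalone numpy sketch, not pyttb: under uniform sampling with replacement, k insertions into N cells hit about N * (1 - (1 - 1/N)**k) distinct cells, and skewed probabilities (a stand-in for the ktensor-weighted cells used here) drive the distinct count lower still.

```python
import numpy as np

# Standalone sanity check (not pyttb code): how many distinct cells do
# k = 0.5 * N insertions hit when sampling with replacement?
N = 20 * 15 * 10   # total cells for shape (20, 15, 10)
k = N // 2         # 1500 requested insertions

# Uniform sampling: expected number of distinct cells hit.
uniform_expected = N * (1 - (1 - 1 / N) ** k)
print(f"uniform expected distinct: {uniform_expected:.0f}")

# Skewed cell probabilities (stand-in for ktensor-weighted sampling)
# concentrate insertions on fewer cells, lowering the distinct count.
rng = np.random.default_rng(0)
p = rng.random(N) ** 3
p /= p.sum()
distinct = np.unique(rng.choice(N, size=k, p=p)).size
print(f"skewed sample distinct: {distinct}")
```

So even a correct implementation lands below k = 1500 distinct nonzeros; the question is how far below, which is what makes the factor-of-~3 gap above suspicious.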

I am willing to merge for now and then add an Issue to address this specific discrepancy. What do you think of that approach?

@ntjohnson1
Collaborator Author

I am willing to merge for now and then add an Issue to address this specific discrepancy. What do you think of that approach?

Sounds reasonable to me, or we could just disable sparse generation for now. That setting is the only thing that routes through generate_data_sparse, which appears to be broken.

@dmdunla
Collaborator

dmdunla commented Aug 29, 2025

Sounds reasonable to me or we could just disable sparse generation for now. That setting is the only thing that routes through generate_data_sparse which appears to be broken.

Thanks, I'll merge it and add a new Issue for sparse_generation.

@ntjohnson1
Collaborator Author

If you have more bandwidth to look at this: it looks like I normalized the factor matrices incorrectly, so the probabilities summed to more than 1, which caused the issue. With your sample script this yields much more reasonable results (~700-740). But I'm also fine taking this over to the follow-up.

diff --git a/pyttb/create_problem.py b/pyttb/create_problem.py
index 36d1c97..f14d2a6 100644
--- a/pyttb/create_problem.py
+++ b/pyttb/create_problem.py
@@ -593,7 +593,7 @@ def generate_data_sparse(
 
     # Convert solution to probability tensor
     # NOTE: Make copy since normalize modifies in place
-    P = solution.copy().normalize(mode=0)
+    P = solution.copy().normalize(normtype=1)
     eta = np.sum(P.weights)
     P.weights /= eta
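For anyone following along, the root cause is which norm the normalization uses: to treat the ktensor as a probability tensor, each factor column must sum to 1 (1-norm), whereas the original call scaled only mode 0 and used the default 2-norm, leaving "probabilities" whose columns sum to more than 1. A minimal numpy illustration of the difference (not the pyttb normalize API itself):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 2))  # a nonnegative factor matrix

# 2-norm scaling (analogous to the buggy call): columns get unit 2-norm,
# but for positive entries each column's SUM then exceeds 1.
A_l2 = A / np.linalg.norm(A, axis=0)

# 1-norm scaling (analogous to the fix): columns sum to 1, i.e. valid
# probability distributions over the mode's indices.
A_l1 = A / A.sum(axis=0)

print(A_l2.sum(axis=0))  # entries > 1 for columns with several positive entries
print(A_l1.sum(axis=0))  # exactly 1 per column
```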

@dmdunla
Collaborator

dmdunla commented Aug 29, 2025

If you have more bandwidth to look at this: it looks like I normalized the factor matrices incorrectly, so the probabilities summed to more than 1, which caused the issue. With your sample script this yields much more reasonable results (~700-740). But I'm also fine taking this over to the follow-up.

  • -P = solution.copy().normalize(mode=0)
  • +P = solution.copy().normalize(normtype=1)

Got it, this makes sense. Thanks for the catch! I'll make the change.

@dmdunla dmdunla merged commit 843cba0 into sandialabs:main Aug 29, 2025
10 checks passed
Successfully merging this pull request may close these issues.

Implement create_problem from Matlab TTB